Advanced partitions

This page documents the technical partition policy enforced by Slurm and job_submit.lua.

Partition policy

interactive10
  Accepted job mode: srun
  Enforced GPU/GRES type: gpu:nvidia_a100_1g.10gb:1
  Default ntasks: 1
  Default cpus-per-task: 4
  CPU cap (ntasks * cpus-per-task): 4
  Default memory (DefMemPerNode): 16G
  Max memory (MaxMemPerNode): 16G
  Default time: partition default
  Max time: 2h

prod10
  Accepted job mode: sbatch
  Enforced GPU/GRES type: gpu:nvidia_a100_1g.10gb:1
  Default ntasks: 1
  Default cpus-per-task: 4
  CPU cap (ntasks * cpus-per-task): 4
  Default memory (DefMemPerNode): 15G
  Max memory (MaxMemPerNode): 15G
  Default time: 4h
  Max time: 24h

prod40
  Accepted job mode: sbatch
  Enforced GPU/GRES type: gpu:nvidia_a100_3g.40gb:1
  Default ntasks: 1
  Default cpus-per-task: 16
  CPU cap (ntasks * cpus-per-task): 16
  Default memory (DefMemPerNode): 60G
  Max memory (MaxMemPerNode): 60G
  Default time: 4h
  Max time: 24h

prod80
  Accepted job mode: sbatch
  Enforced GPU/GRES type: gpu:nvidia_a100-sxm4-80gb:1
  Default ntasks: 1
  Default cpus-per-task: 32
  CPU cap (ntasks * cpus-per-task): 32
  Default memory (DefMemPerNode): 120G
  Max memory (MaxMemPerNode): 120G
  Default time: 4h
  Max time: 24h
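
The per-partition defaults can be restated as a small lookup, which is handy when writing wrapper scripts. This is an illustrative helper, not part of the cluster tooling; the values are read off the table above:

```shell
# Hypothetical helper restating the partition policy table (illustrative only).
partition_defaults() {
  case "$1" in
    interactive10|prod10) echo "gres=gpu:nvidia_a100_1g.10gb:1 cpus=4" ;;
    prod40)               echo "gres=gpu:nvidia_a100_3g.40gb:1 cpus=16" ;;
    prod80)               echo "gres=gpu:nvidia_a100-sxm4-80gb:1 cpus=32" ;;
    *)                    echo "unknown partition: $1" >&2; return 1 ;;
  esac
}
partition_defaults prod40   # gres=gpu:nvidia_a100_3g.40gb:1 cpus=16
```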

What is QoS (Quality of Service)?

In Slurm, a QoS (Quality of Service) is a named policy profile that can be attached to jobs, users, or accounts. It controls scheduling behavior: limits (for example, the maximum number of concurrently running jobs), priority, and preemption rules. On this cluster, QoS enforces part of the job concurrency policy.

What job_submit.lua enforces

  • A partition must be provided (-p/--partition); jobs without one are rejected.
  • A missing --gres is auto-filled from the partition policy.
  • A missing --ntasks is auto-filled to 1.
  • A missing --cpus-per-task is auto-filled from the partition policy.
  • CPU requests exceeding the partition cap (ntasks * cpus-per-task) are rejected.
  • Memory defaults and caps come from the partition's DefMemPerNode / MaxMemPerNode.
  • prod* partitions reject interactive submissions (srun); use sbatch instead.
  • For jobs in QoS normal (or with an empty QoS), at most 4 running jobs are allowed in total.
  • For jobs in QoS normal (or with an empty QoS), at most 2 running jobs are allowed across prod40 and prod80 combined.
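
The CPU-cap rule above can be sketched as follows. This is an illustrative restatement; the real check runs in job_submit.lua on the Slurm controller, and the function name here is hypothetical:

```shell
# Sketch of the CPU-cap check: reject when ntasks * cpus-per-task
# exceeds the partition cap. A missing ntasks defaults to 1, mirroring
# the auto-fill behavior described above.
check_cpu_cap() {  # usage: check_cpu_cap <ntasks> <cpus-per-task> <cap>
  local ntasks=${1:-1} cpus=$2 cap=$3
  if [ $(( ntasks * cpus )) -gt "$cap" ]; then
    echo "rejected: ${ntasks} x ${cpus} CPUs exceeds cap ${cap}" >&2
    return 1
  fi
  echo "accepted"
}
check_cpu_cap 1 16 16           # accepted (fits the prod40 cap of 16)
check_cpu_cap 2 16 16 || true   # rejected: 32 CPUs exceed the cap
```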

If your workload needs more resources or higher limits, contact support: dgx_support@listes.centralesupelec.fr.

Simplified vs explicit submissions

Simplified (recommended first):

srun -p interactive10 --time=00:30:00 --pty bash
sbatch -p prod10 --time=04:00:00 --wrap="python3 train.py"

Explicit (advanced override):

sbatch -p prod40 --gres=gpu:nvidia_a100_3g.40gb:1 --ntasks=1 --cpus-per-task=16 --time=08:00:00 train.sbatch
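
The contents of train.sbatch are not shown here; a script carrying the same settings as in-file #SBATCH directives might look like the sketch below (illustrative only). Note that options passed on the sbatch command line override the corresponding #SBATCH directives in the script:

```shell
# Write an illustrative batch script equivalent to the explicit
# command-line submission above (train.py is a placeholder name).
cat > train.sbatch <<'EOF'
#!/bin/bash
#SBATCH --partition=prod40
#SBATCH --gres=gpu:nvidia_a100_3g.40gb:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=08:00:00
python3 train.py
EOF
```

With the directives in the script, the submission reduces to `sbatch train.sbatch`.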

Use explicit settings only when you need to override defaults for a specific workload.

Technical rationale: memory on the DGX node is budgeted with system headroom (about 32G reserved), and the remainder is split by GPU class for the production partitions.
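
The production caps are consistent with scaling a 120G full-GPU budget by each MIG profile's share of GPU memory. This is a quick sanity check of the table, not the actual budgeting code; the 120G figure and the scaling rule are inferred from the published caps:

```shell
# Assumption: per-class memory cap = full-GPU budget * (MIG profile
# memory / 80GB). The 120G full-GPU budget is the prod80 cap from the
# table above.
full_gpu_budget=120
echo "prod10 cap: $(( full_gpu_budget * 10 / 80 ))G"   # 15G, matches the table
echo "prod40 cap: $(( full_gpu_budget * 40 / 80 ))G"   # 60G, matches the table
```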